Abstract

Consider estimation of the regression parameter in the accelerated 
failure time model, when data are obtained by cross sectional sam- 
pling. It is shown that it is possible under regularity of the model to 
construct an efficient estimator of the unknown Euclidean regression 
parameter if the distribution of the covariate vector is known and also 
if it is unknown with vanishing mean. 

1 Introduction

The model most frequently used in survival analysis is the Cox Proportional 
Hazards (PH) model, which is also called the Cox regression model; see Cox 
(1972). Let T be the survival time, and W a vector of covariates of dimension 
k. Given W = w, the Cox model is determined by the hazard rate of T, 

$$\lambda_\theta(t \mid w) = e^{\theta^T w}\lambda(t), \quad t > 0, \tag{1.1}$$

where $\theta \in \Theta$ is an unknown $k$-vector of parameters. The baseline hazard
function $\lambda$ corresponds to a survival function $\bar G = 1 - G$ via

$$\lambda(t) = \frac{g(t)}{\bar G(t)}, \quad t > 0, \tag{1.2}$$

where $G$ is an absolutely continuous distribution function with density $g$. For
this model without restrictions on the baseline hazard function, there exists
an explicit asymptotically efficient estimator for $\theta$. This partly explains the
popularity of this model. Efficiency of Cox's estimator is proved e.g. in 
Tsiatis (1981). This efficiency holds uniformly in the cumulative hazard 
function; see Klaassen (1989). 

Another model used in survival analysis is the closely related Accelerated 
Failure Time (AFT) model, given by 

$$T = e^{-\theta^T W} V, \tag{1.3}$$

where $V$ is a nondegenerate random variable on $[0, \infty)$ with unknown hazard
function $\lambda$ and where $V$ and $W$ are independent both of each other and of $\theta$.
In this scale model $\lambda$ serves as baseline hazard. The conditional hazard rate
of $T$ given $W = w$ is given by

$$\lambda_\theta(t \mid w) = e^{\theta^T w}\lambda(e^{\theta^T w} t), \quad t > 0. \tag{1.4}$$

This conditional hazard function scales the baseline hazard function in time 
depending on the covariates. So the effect of the covariates is to accelerate 
or decelerate the aging process, thus influencing the time of failure, depend- 
ing on the relevant characteristics of the individual. Both the PH and the 
AFT model are extensively discussed in for instance Kalbfleisch and Pren- 
tice (1980). Despite the extremely frequent application of the PH model in 
practice, even Sir David Cox himself claims that '. . . accelerated life models 
are in many ways more appealing [than proportional hazard models] because 
of their quite direct physical interpretation . . . '; see Reid (1994), p. 450.
Furthermore, it turns out that the AFT model is technically more tractable 
than the PH model under cross sectional sampling. 
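
The time-scaling in (1.4) can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a scalar covariate and a Weibull baseline hazard (an illustrative choice), and all function names are hypothetical. It compares the closed form (1.4) with the hazard obtained by differentiating the log survival function of T = exp(-theta w) V.

```python
import math

def baseline_hazard(t, shape=1.5):
    """Weibull baseline hazard lambda(t) = shape * t**(shape - 1) (assumed for illustration)."""
    return shape * t ** (shape - 1)

def baseline_survival(t, shape=1.5):
    """Survival function of V that matches baseline_hazard: exp(-t**shape)."""
    return math.exp(-t ** shape)

def aft_hazard(t, w, theta, shape=1.5):
    """Conditional hazard (1.4) for scalar w: e^{theta w} * lambda(e^{theta w} t)."""
    scale = math.exp(theta * w)
    return scale * baseline_hazard(scale * t, shape)

def aft_survival(t, w, theta, shape=1.5):
    """P(T > t | W = w) for T = exp(-theta w) V, i.e. P(V > e^{theta w} t)."""
    return baseline_survival(math.exp(theta * w) * t, shape)

def numeric_hazard(t, w, theta, shape=1.5, eps=1e-6):
    """Hazard computed as -d/dt log S(t | w) with a central difference."""
    up = math.log(aft_survival(t + eps, w, theta, shape))
    down = math.log(aft_survival(t - eps, w, theta, shape))
    return -(up - down) / (2 * eps)

for t in (0.5, 1.0, 2.0):
    # The two columns should agree up to numerical differentiation error.
    print(t, aft_hazard(t, 1.0, 0.3), numeric_hazard(t, 1.0, 0.3))
```

For w = 0 the conditional hazard reduces to the baseline hazard, which is the sense in which the covariates only rescale time.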

To observe the (possibly censored) survival times of a group of individuals 
can be quite time consuming and expensive. An alternative is cross sectional 
sampling. That is, at a specific point in time an i.i.d. random sample of 
fixed size n is taken, containing the survival times $(X_1, \ldots, X_n)$ from onset
up to this point and their corresponding covariate vectors $(Z_1, \ldots, Z_n)$. The
distribution of the survival times in such a sample typically differs from the 
distribution of the real survival times. On the one hand, individuals with 
a longer survival time have a higher probability of being sampled. Here it 
is assumed that the density of the real survival time under cross sectional 
sampling at y is proportional to y times the density of the real survival time 
in the core model at y (length bias); see Van Es, Klaassen, and Oudshoorn 
(2000, (A.7)). On the other hand, the observations are censored multiplicatively,
that is, if Y represents the real survival time of an individual in the
sample and the random variable X represents his observed survival time,
then it is assumed that the time point of sampling is uniformly distributed
over the whole survival period of length Y, so

$$X = UY,$$

with U and Y independent and U uniformly distributed on [0,1].
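
The two sampling effects described above can be illustrated by simulation. The sketch below is an illustration under assumptions that are not in the paper: there are no covariates and the core survival time is standard exponential. In that case the length-biased duration Y has density y exp(-y), which is the Gamma(2, 1) density, and the observed time X = UY again has the standard exponential density. The function name `sample_cross_sectional` is hypothetical.

```python
import math
import random

def sample_cross_sectional(n, rng):
    """Draw n observed times X = U * Y for an Exp(1) core survival time.

    Length bias: the sampled full duration Y has density y * exp(-y),
    i.e. Gamma(shape=2, scale=1).
    Multiplicative censoring: X = U * Y with U uniform on [0, 1].
    """
    xs = []
    for _ in range(n):
        y = rng.gammavariate(2.0, 1.0)   # length-biased duration
        u = rng.random()                 # relative position of the sampling time
        xs.append(u * y)
    return xs

rng = random.Random(12345)
xs = sample_cross_sectional(100_000, rng)
mean_x = sum(xs) / len(xs)                       # should be close to 1
tail_x = sum(x > 1.0 for x in xs) / len(xs)      # should be close to exp(-1)
print(mean_x, tail_x, math.exp(-1.0))
```

The exponential case is special in that the observed times have the same distribution as the core survival times; for other baselines the two distributions differ.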

A model under cross sectional sampling is completely determined by the 
set of all possible distributions of the real survival time T and the covariate 
vector W. This set is called the core model. Let $h$ denote the density of the
covariates in the core model with respect to a measure $\nu$ and let $\bar G(\cdot \mid w)$ be
the conditional survival function in the core model, given $W = w$. Then the
joint density of $X$ and $Z$ equals

$$f(x, z) = \frac{\bar G(x \mid z)\, h(z)}{E\,T}, \quad x > 0,\ z \in \mathbb{R}^k. \tag{1.5}$$

This has been shown by Van Es et al. (2000, p. 305). 

For the AFT model the formulas for the joint density of the observed 
survival time X and the corresponding covariate vector Z, the marginal 
density of Z and the conditional density of X given Z = z, are (Van Es et 
al. (2000, p. 303)) 

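$$f_\theta(x, z) = \frac{\bar G(e^{\theta^T z} x)\, h(z)}{E_{g,h} T}, \quad x > 0,\ z \in \mathbb{R}^k, \tag{1.6}$$

$$f_{\theta,Z}(z) = \frac{e^{-\theta^T z} h(z)}{E_h e^{-\theta^T W}}, \tag{1.7}$$

$$f_\theta(x \mid z) = \frac{e^{\theta^T z}\, \bar G(e^{\theta^T z} x)}{E_g V}, \quad x > 0,\ z \in \mathbb{R}^k, \tag{1.8}$$

respectively, with $E_g V = \int v g(v)\,dv$, $E_h e^{-\theta^T W} = \int e^{-\theta^T w} h(w)\,d\nu(w)$, and
$E_{g,h} T = E_g V\, E_h e^{-\theta^T W}$. Note that the conditional density (1.8) determines
a scale model. However, the whole model is not an AFT model because density
(1.7) of $Z$ depends on $\theta$. Also note that this marginal density describes a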
parametric model if the density h of the covariate vector in the core model is 
known. In the PH model the corresponding marginal density of Z describes 
a genuine semiparametric model. 

For the case where the density h is known there exists an asymptotically 
efficient estimator of $\theta$. This has been conjectured in Van Es et al. (2000)
with a sketch of a proof. In the following sections a complete proof will be 
given and it will be extended, under conditions, to the case of an unknown 
core distribution of the covariates. 

In Section 2.1 the AFT model for cross sectional data with known core
distribution of the covariate vector is defined more rigorously together with
an outline of the concepts from semiparametric statistics needed for the
subsequent sections. For a general survey of semiparametric models, see Bickel,
Klaassen, Ritov & Wellner (1993), from now on referred to as BKRW. In
Section 2.2 the existence of a $\sqrt{n}$-unbiased consistent estimator of the efficient
influence function of the model is proved and the existence of a $\sqrt{n}$-consistent
estimator of the parameter vector will be shown. It is concluded then that
an efficient estimator for the parameter vector exists. The proofs provide
the tools for the construction of such an estimator. The model is extended
in Section 3 to the situation where the density of the covariates in the core
model has mean zero but is unknown otherwise. Under regularity conditions
on this density, again the existence of an efficient estimator of the parameter
vector is proved. The applied approach is based on the identifiability of the
regression parameter if only the covariates would have been observed. This
identifiability does not hold anymore if the distribution of the covariates in
the core model is completely unknown. Consequently, different techniques
will be required then to estimate the regression parameter efficiently.

2 Known core distribution of the covariates 
2.1 Model representation 

Let $\mathcal{G}$ be a convex set of distribution functions $G$ of the continuous random
variable $V$ such that their corresponding density functions $g$ with respect to
Lebesgue measure $\mu$ on $(0, \infty)$ satisfy

(C1) $E_G V = \int v g(v)\,dv < \infty$,

(C2) $E_G V^2 \lambda(V) = \int v^2 \lambda(v) g(v)\,dv < \infty$,

where the hazard rate $\lambda$ corresponds to $g$ via (1.2). Let $\mathcal{H}$ be a collection
of core density functions $h$ of $W$ with respect to the dominating measure $\nu$
such that

(C3) the covariance matrix $\Sigma_W = E_h\{(W - EW)(W - EW)^T\}$ exists and is
nonsingular.

Given $h \in \mathcal{H}$, let the parameter space $\Theta_h \subset \mathbb{R}^k$ be chosen such that the
following conditions are satisfied:

(C4) $E_h e^{-\theta^T W} = \int e^{-\theta^T w} h(w)\,d\nu(w) < \infty$, $\forall \theta \in \Theta_h$,

(C5) $E_h |W|^2 e^{-\theta^T W} < \infty$, $\forall \theta \in \Theta_h$.

Conditions (C1) and (C4) ensure that (1.6)-(1.8) really are densities.
Conditions (C3) and (C5) are explained by the following lemma.

Lemma 2.1. Fix $h \in \mathcal{H}$ and $\theta \in \Theta_h$. If $Z$ has density (1.7) and if conditions
(C3) and (C5) hold, then the covariance matrix of $Z$ is nonsingular.

Proof. By condition (C5) the covariance matrix $\Sigma_Z$ of $Z$ is well-defined. Assume
$\Sigma_Z$ is singular. Then there exists a nonzero $a \in \mathbb{R}^k$ such that $\Sigma_Z a = 0$.
This implies $a^T \Sigma_Z a = E(a^T(Z - EZ))^2 = 0$. So there exists a $b \in \mathbb{R}$ such
that $a^T Z = b$ a.s. With $A = \{z \in \mathbb{R}^k : a^T z \neq b\}$ this means

$$0 = P(a^T Z \neq b) = \int_A f_{\theta,Z}(z)\,d\nu(z) = \frac{1}{E_h e^{-\theta^T W}} \int_A e^{-\theta^T z} h(z)\,d\nu(z).$$

This yields $\int_A h(z)\,d\nu(z) = P(a^T W \neq b) = 0$ and hence $a^T W = b$ a.s. or
$a^T(W - EW) = 0$ a.s. Consequently $E\,a^T(W - EW)(W - EW)^T = a^T \Sigma_W = 0$
holds, which contradicts the nonsingularity in condition (C3). □

Fix the density $h$. The model from which the cross sectional data
$(X_1, \ldots, X_n, Z_1, \ldots, Z_n)$ are drawn is represented by

$$\mathcal{P} := \{P_{\theta,G} : \theta \in \Theta_h,\ G \in \mathcal{G}\}, \tag{2.1}$$

where $P_{\theta,G}$ is a distribution with density $f_\theta = dP_{\theta,G}/d(\mu \times \nu)$, as given in
(1.6), with respect to the product measure $\mu \times \nu$. Estimation of $\theta$ with $G$, and
later also $h$, as infinite dimensional nuisance parameters will be considered.
For fixed $P_0 = P_{\theta_0,G_0}$ define the submodels

$$\mathcal{P}_1 := \{P_{\theta,G_0} : \theta \in \Theta_h\} \tag{2.2}$$

and

$$\mathcal{P}_2 := \{P_{\theta_0,G} : G \in \mathcal{G}\}. \tag{2.3}$$

By (C1)-(C5) and (1.6) and by absolute continuity of the distribution
function $G_0$ the submodel $\mathcal{P}_1$ is regular parametric. This may be verified as
in the proof of Proposition 2.1.1 of BKRW. The condition in this proposition
of continuous differentiability of $f_\theta$ is stronger than necessary. Absolute
continuity of $G_0$ suffices; cf. Example 2.1.2 of BKRW.

For simplicity in later calculations define 

$$Y = Y_\theta = e^{\theta^T Z} X. \tag{2.4}$$

With the subscript $\theta$ suppressed, the variables $Y_i$, $i = 1, \ldots, n$, are defined
in the same way. These variables have density

$$g_Y(y) := \bar G(y)/E_g V \tag{2.5}$$

independent of $\theta$ and they have score for location equal to the baseline hazard
function $\lambda(y)$. Notice that $E\,Y\lambda(Y) = 1$ holds and that in view of (1.8) the
random variables Y and Z are independent. For convenience the formulas 
for functions that depend on X and Z will be given in terms of Y and Z 
instead. 

The tangent space $\dot{\mathcal{P}}$ of $\mathcal{P}$ at $P_0$ is defined as the closed linear span of
the tangent spaces of all parametric paths through $P_0$. This definition easily
extends to the submodels $\mathcal{P}_1$ and $\mathcal{P}_2$. The tangent space $\dot{\mathcal{P}}_1$ of $\mathcal{P}_1$ is given by
the linear span of the score function $\dot l_1$ for $\theta$, because $\mathcal{P}_1$ is parametric. This
score function equals

$$\dot l_1(X, Z) = EZ - Z\,Y\lambda_0(Y). \tag{2.6}$$

The tangent space $\dot{\mathcal{P}}_2$ of $\mathcal{P}_2$ is harder to determine. However, calculation
of $\dot{\mathcal{P}}_2$ can be sidestepped. It will be shown that there exists a parametric
submodel of $\mathcal{P}_2$ and a model that contains $\mathcal{P}_2$ such that the projections of
the score function $\dot l_1$ on the respective tangent spaces are equal. Therefore, they
also equal the projection of $\dot l_1$ on $\dot{\mathcal{P}}_2$, and hence the efficient score function defined
below can be calculated. A parametric submodel with the same information
bound as $\mathcal{P}$ is called least favorable.

Definition 2.1. The efficient score function $l_1^* \in (L_2^0(P_0))^k$ for $\theta$ in the full
model $\mathcal{P}$ at $P_0 = P_{\theta_0,G_0}$ is defined by

$$l_1^* = \dot l_1 - \Pi_0(\dot l_1 \mid \dot{\mathcal{P}}_2), \tag{2.7}$$

that is, by the score function for $\theta$ minus its componentwise orthogonal
projection on the linear subspace $\dot{\mathcal{P}}_2$ of the tangent space of the nuisance
parameter. The efficient Fisher information for $\theta$ in the presence of the unknown
nuisance parameter $G$ at $P_0$ in $\mathcal{P}$ is defined by

$$I(P_0 \mid \theta, \mathcal{P}) = E\, l_1^* l_1^{*T} \tag{2.8}$$

and the efficient influence function by

$$\tilde l_1 = (E\, l_1^* l_1^{*T})^{-1} l_1^*. \tag{2.9}$$

The inverse $I^{-1}(P_0 \mid \theta, \mathcal{P})$ of the Fisher information matrix is called the
information bound.

For every parametric path $G_\eta$ through $G_0 = G$, the joint density (1.6) is
a function of $Y = \exp(\theta^T Z)X$. Therefore the tangent space $\dot{\mathcal{P}}_2$ consists of
square integrable functions of $Y$ with zero mean under $g_Y$, that is

$$\dot{\mathcal{P}}_2 \subset L_2^0(Y) := \Big\{a : \mathbb{R} \to \mathbb{R} : \int a^2(y) g_Y(y)\,dy < \infty,\ \int a(y) g_Y(y)\,dy = 0\Big\}. \tag{2.10}$$

Lemma 2.2. Let $g$ be a density satisfying conditions (C1) and (C2). The
componentwise projection of $\dot l_1$ on $L_2^0(Y)$ is given by

$$\Pi_0(\dot l_1 \mid L_2^0(Y)) = (1 - Y\lambda_0(Y))\,EZ. \tag{2.11}$$

Proof. Because $a \in L_2^0(Y)$ is one-dimensional and $\dot l_1$ is a $k$-vector, all
projections are taken componentwise. Therefore, in this proof the projection $a^*$
of $\dot l_1$ on the space $L_2^0(Y)$ is also regarded as a $k$-vector and is calculated by
solving componentwise

$$\dot l_1 - a^* \perp a, \quad \forall a \in (L_2^0(Y))^k. \tag{2.12}$$

That is, solve componentwise

$$E\big[(\dot l_1(X, Z) - a^*(Y))\,a(Y)\big] \tag{2.13}$$
$$= E\big[E(\dot l_1(X, Z) - a^*(Y) \mid Y)\,a(Y)\big] = 0, \quad \forall a \in (L_2^0(Y))^k. \tag{2.14}$$

It is easy to check that the last equality holds if

$$a^*(Y) = (1 - Y\lambda_0(Y))\,EZ, \tag{2.15}$$

which is indeed an element of $(L_2^0(Y))^k$ by condition (C2). □

The right hand side of equation (2.15) can be seen to equal the score
function for scale of the density $g_Y$ given by (2.5) except for a constant. The
joint density of $X$ and $Z$, given by (1.6), can be written as a product of this
density with a function independent of this density as follows

$$f_\theta(x, z) = g_Y(y)\,\frac{h(z)}{E_h e^{-\theta^T W}}. \tag{2.16}$$

Define the parametric path $G_\eta$ through $G_0$ by the scale transformation

$$G_\eta(x) = G_0(x e^\eta), \quad \eta \in \mathbb{R}, \tag{2.17}$$

and define the parametric submodels of $\mathcal{P}$ by

$$\mathcal{Q} = \{P_{\theta,G_\eta} : \theta \in \Theta_h,\ \eta \in \mathbb{R}\}, \tag{2.18}$$

$$\mathcal{Q}_1 = \{P_{\theta,G_0} : \theta \in \Theta_h\} = \mathcal{P}_1, \quad \text{and} \tag{2.19}$$

$$\mathcal{Q}_2 = \{P_{\theta_0,G_\eta} : \eta \in \mathbb{R}\}. \tag{2.20}$$

Without its simple proof we state the following lemma.

Lemma 2.3. Let $g$ be a density as in Lemma 2.2 and let $\mathcal{Q}_2$ be defined by
(2.20). The tangent space $\dot{\mathcal{Q}}_2$ is generated by the score function $\dot l_2$ for $\eta$, given
by

$$\dot l_2(X, Z) = 1 - Y\lambda_0(Y). \tag{2.21}$$

The componentwise projection of $\dot l_1$ on $\dot{\mathcal{Q}}_2$ is given by

$$\Pi_0(\dot l_1 \mid \dot{\mathcal{Q}}_2) = (1 - Y\lambda_0(Y))\,EZ, \tag{2.22}$$

and hence $\mathcal{Q}_2$ is least favorable.

These lemmas yield the following theorem. 

Theorem 2.4. Let $\mathcal{G}$ be a convex set of distribution functions $G$ with
density $g$ satisfying conditions (C1) and (C2) and fix the density $h \in \mathcal{H}$ of the
covariates such that conditions (C3)-(C5) are fulfilled. Let $\Sigma_Z$ denote the
covariance matrix of $Z$ under $P_0$. Then the efficient score function $l_1^*$ for $\theta$
(cf. (2.7)) in the model $\mathcal{P}$ at $P_0$ equals

$$l_1^*(X, Z, P_0 \mid \theta, \mathcal{P}) = -(Z - EZ)\,Y\lambda_0(Y). \tag{2.23}$$

The corresponding Fisher information at $P_0$ (cf. (2.8)) equals

$$I(P_0 \mid \theta, \mathcal{P}) = \Sigma_Z\, E(Y\lambda_0(Y))^2, \tag{2.24}$$

and is nonsingular.

Proof. As has already been mentioned, because $\dot{\mathcal{Q}}_2 \subset \dot{\mathcal{P}}_2 \subset L_2^0(Y)$ and
Lemmas 2.2 and 2.3 hold, the projection of the score function $\dot l_1$ on the tangent
space $\dot{\mathcal{P}}_2$ is given by (2.22). The theorem follows by (2.6)-(2.8). Existence
and nonsingularity of $\Sigma_Z$ is a consequence of conditions (C3) and (C5) and
Lemma 2.1. Hence finiteness and nonsingularity of $I(P_0 \mid \theta, \mathcal{P})$ follow by
condition (C2). □
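
As a concrete check of Theorem 2.4, consider the following worked example. It is an illustration and not part of the original derivation: the baseline is assumed to be standard exponential, $G(v) = 1 - e^{-v}$. The sketch verifies conditions (C1) and (C2) for this choice and evaluates (2.23) and (2.24).

```latex
\begin{align*}
g(v) &= e^{-v}, \qquad \bar G(v) = e^{-v}, \qquad
  \lambda_0(v) = g(v)/\bar G(v) = 1, \\
E_G V &= \int_0^\infty v\, e^{-v}\,dv = 1 < \infty, && \text{(C1)} \\
E_G V^2 \lambda_0(V) &= \int_0^\infty v^2 e^{-v}\,dv = 2 < \infty, && \text{(C2)} \\
g_Y(y) &= \bar G(y)/E_G V = e^{-y}, \qquad
  E\,Y\lambda_0(Y) = E\,Y = 1, \qquad
  E(Y\lambda_0(Y))^2 = E\,Y^2 = 2, \\
l_1^*(X, Z, P_0 \mid \theta, \mathcal{P}) &= -(Z - EZ)\,Y, \qquad
  I(P_0 \mid \theta, \mathcal{P}) = 2\,\Sigma_Z .
\end{align*}
```

Under this assumed baseline the information bound is therefore $\Sigma_Z^{-1}/2$.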

If $T_n$ is an arbitrary regular estimator of $\theta$, then its asymptotic covariance
matrix is at least as large as the information bound for the model $\mathcal{P}$ given by
the inverse of (2.8), according to the convolution theorem. The main aim of
this paper is to prove the existence of an estimator for which the information 
bound is attained. 

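Definition 2.2. An efficient estimator $\hat\theta_n$ is an estimator that for all $P_0 \in \mathcal{P}$
is locally asymptotically regular and normally distributed under $P_0$ with
covariance matrix $I^{-1}(P_0 \mid \theta, \mathcal{P})$, or equivalently satisfies (cf. (2.9))

$$\sqrt{n}\,\Big(\hat\theta_n - \theta_0 - \frac{1}{n}\sum_{i=1}^n \tilde l_1(X_i, Z_i)\Big) = o_P(1) \tag{2.25}$$

under $P_0$.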

In the next section such an estimator will be constructed. 

2.2 Existence of an efficient estimator of $\theta$

In this section it is shown that $\theta$ can be estimated $\sqrt{n}$-consistently and,
assuming knowledge of $\theta$, that the efficient influence function $\tilde l_1$ can be
estimated $\sqrt{n}$-unbiasedly and consistently. By Klaassen (1987), existence of
such estimators is equivalent to existence of an efficient estimator of $\theta$.

Theorem 2.5. Let model $\mathcal{P}$ be given by (2.1), with the density $h$ fixed.
If conditions (C1) through (C5) are satisfied, then there exists an efficient
estimator $\hat\theta_n$ of $\theta$.

Proof. The following conditions of Corollary 7.8.1 of BKRW (cf. Klaassen
(1987)) will be verified:

1. (Smoothness)

$$\sqrt{n}\,\Big|\theta_n - \theta + \frac{1}{n}\sum_{i=1}^n \big[\tilde l_1(X_i, Z_i; \theta_n, G) - \tilde l_1(X_i, Z_i; \theta, G)\big]\Big| = o_{P_{\theta,G}}(1) \tag{2.26}$$

for all $(\theta, G)$ and all sequences $\{\theta_n\}$ with $\sqrt{n}\,|\theta_n - \theta| = O(1)$.

2. (Preliminary estimator of the parameter) There exists a $\sqrt{n}$-consistent
preliminary estimator $\tilde\theta_n$ of $\theta$.

3. ($\sqrt{n}$-unbiased consistent estimator of the efficient influence function)
There exists an estimator $\hat l_1(\cdot, \cdot\,; \theta; \underline{X}, \underline{Z})$ of $\tilde l_1(\cdot, \cdot\,; \theta, G)$ satisfying

$$\sqrt{n} \int \hat l_1(x, z; \theta_n; \underline{X}, \underline{Z})\,dP_{\theta_n,G}(x, z) = o_{P_{\theta_n,G}}(1) \tag{2.27}$$

and

$$\int \big|\hat l_1(x, z; \theta_n; \underline{X}, \underline{Z}) - \tilde l_1(x, z; \theta_n, G)\big|^2\,dP_{\theta_n,G}(x, z) = o_{P_{\theta_n,G}}(1) \tag{2.28}$$

for all $(\theta, G)$ and all sequences $\theta_n$ with $\sqrt{n}\,|\theta_n - \theta| = O(1)$.

Because there exists a regular least favorable subfamily of $\mathcal{P}$, namely $\mathcal{Q}_2$ given
by (2.20), the smoothness condition is fulfilled, see Bickel (1982, (6.43), page
670) or BKRW (2.1.15).

The condition of a preliminary estimator is also fulfilled. For any $h \in \mathcal{H}$
the collection of distributions with density $f_{\theta,Z}$, $\theta \in \Theta_h$, is regular parametric.
Thus, according to Le Cam (1956), there exists a $\sqrt{n}$-consistent estimator
$\tilde\theta_n$ of $\theta$. Actually $f_{\theta,Z}$ is a density from a full exponential family, hence the
maximum likelihood estimator is a moment estimator, and as such
asymptotically normal with convergence rate $\sqrt{n}$ (see e.g. Van der Vaart (1998),
Chapter 4).
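
The exponential family remark can be made concrete in a special case. The sketch below assumes, purely for illustration, a scalar covariate with standard normal core density h. Density (1.7) is then proportional to exp(-theta z) exp(-z^2/2), which is the N(-theta, 1) density, so the likelihood equation reduces to the moment equation mean(Z) = -theta. The function name `preliminary_estimator` is hypothetical.

```python
import random

def preliminary_estimator(zs):
    """MLE of theta when h is N(0, 1) (an assumption of this sketch).

    Density (1.7) is then the N(-theta, 1) density, so the MLE solves
    the moment equation mean(Z) = -theta.
    """
    return -sum(zs) / len(zs)

rng = random.Random(2024)
theta_true = 0.5
# Under the assumed h, the observed covariates are N(-theta_true, 1).
zs = [rng.gauss(-theta_true, 1.0) for _ in range(20_000)]
theta_tilde = preliminary_estimator(zs)
print(theta_tilde)
```

In this special case the preliminary estimator has variance 1/n, so it is root-n consistent but uses only the covariates.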

Now assume that $\theta$ is known. Remember that the density $g_Y$ of $Y$ (cf.
(2.5)) has score for location equal to the baseline hazard function $\lambda$. The
next lemma, proven in Section 4, shows that there exist estimators $\hat g_Y$
and $\hat\lambda$ for $g_Y$ and $\lambda = -g_Y'/g_Y$, respectively, based on $Y_1, \ldots, Y_n$, satisfying

$$\int y^2 (\hat\lambda(y) - \lambda(y))^2 g_Y(y)\,dy = o_{P_Y}(1), \tag{2.29}$$

$$\hat I_1 := \int y^2 \hat\lambda^2(y)\,\hat g_Y(y)\,dy \xrightarrow{P} I_1. \tag{2.30}$$

Here $I_1 = \int y^2 \lambda^2(y)\,g_Y(y)\,dy = E(Y\lambda(Y))^2$.

Lemma 2.6. If the density $g_Y$ is absolutely continuous on $[0, \infty)$, then there
exist estimators that satisfy conditions (2.29) and (2.30).

Because the distribution function $G$ is absolutely continuous, by definition
$g_Y$ is also absolutely continuous on $[0, \infty)$; cf. (2.5).

Proposition 2.7. Let the estimators $\hat g_Y$ and $\hat\lambda$ for $g_Y$ and $\lambda$, respectively,
based on $Y_1, \ldots, Y_n$ (cf. (2.4)), satisfy (2.29) and (2.30). Then the efficient
influence function of the cross sectional accelerated failure time model can
be estimated consistently and $\sqrt{n}$-unbiasedly, that is, the third condition of
Theorem 2.5 is satisfied.

Proof. To get an estimator of the efficient score function $l_1^*$, plug in the
estimator of $\lambda$, so

$$\hat l_1^*(X, Z) := -(Z - EZ)\,Y\hat\lambda(Y). \tag{2.31}$$

As an estimator of the efficient influence function $\tilde l_1$ take (cf. (2.9), (2.8),
and (2.24))

$$\hat l_1 := (\hat I_1 \Sigma_Z)^{-1}\,\hat l_1^*. \tag{2.32}$$

Independence of $Z$ and $Y$ yields unbiasedness and hence $\sqrt{n}$-unbiasedness.
Consistency is proved by

$$\begin{aligned}
\iint \big|\hat l_1(x, z) - \tilde l_1(x, z)\big|^2 f_\theta(x, z)\,dx\,d\nu(z)
&\le 2\,\mathrm{Tr}(\Sigma_Z^{-1}) \Big\{ \frac{1}{\hat I_1^2} \int (y\hat\lambda(y) - y\lambda(y))^2 g_Y(y)\,dy \\
&\qquad + \Big(\frac{1}{I_1} - \frac{1}{\hat I_1}\Big)^2 \int (y\lambda(y))^2 g_Y(y)\,dy \Big\} \\
&= o_{P_Y}(1),
\end{aligned} \tag{2.33}$$

which follows by conditions (C2), (2.29) and (2.30). Note that for any vector
$z$ we may write $|z|^2 = z^T z = \mathrm{Tr}(zz^T)$. This proves the proposition for $\theta$
fixed. However, it still holds for $\theta_n$ tending to $\theta$. □

An efficient estimator of $\theta$ can now be constructed using sample splitting,
basing $\tilde\theta_n$ and $\hat l_1$ on different independent parts of the sample (see e.g. BKRW,
equation (22), page 396). Such a construction is mainly meant to show the
existence of an efficient estimator. Though it looks artificial at first sight,
Klaassen (2001) argues that sample splitting can be considered to be quite
reasonable. □

In practice one will ignore any dependence between $\tilde\theta_n$ and $\hat l_1$, taking

$$\hat\theta_n = \tilde\theta_n + \frac{1}{n}\sum_{i=1}^n \hat l_1(X_i, Z_i; \tilde\theta_n) \tag{2.34}$$

as an estimator. Efficiency can be proved then, but only under extra
conditions, see Schick (1986).
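
The one-step update (2.34) can be sketched in a toy setting. The code below is an illustration under strong assumptions that are not the paper's construction: a scalar covariate, h standard normal (so Z is N(-theta, 1)), a standard exponential baseline (so the hazard is identically 1 and Y is Exp(1)), and oracle values for the nuisance quantities instead of the estimators of Lemma 2.6. The names `simulate` and `one_step` are hypothetical.

```python
import math
import random

def simulate(n, theta, rng):
    """Cross sectional AFT data for k = 1 under illustrative assumptions:
    h = N(0, 1), so Z ~ N(-theta, 1) by (1.7); G = Exp(1), so Y ~ Exp(1) by (2.5);
    Y and Z are independent and X = exp(-theta * Z) * Y by (2.4)."""
    data = []
    for _ in range(n):
        z = rng.gauss(-theta, 1.0)
        y = rng.expovariate(1.0)
        data.append((math.exp(-theta * z) * y, z))
    return data

def one_step(data, theta_tilde):
    """One-step update (2.34) with oracle nuisance quantities.

    Assumed known here (the paper estimates them): lambda = 1, hence
    I_1 = E Y^2 = 2, Sigma_Z = 1 and E Z = -theta."""
    i1, sigma_z = 2.0, 1.0
    total = 0.0
    for x, z in data:
        y = math.exp(theta_tilde * z) * x        # Y evaluated at the preliminary value
        score = -(z + theta_tilde) * y           # efficient score (2.23) with lambda = 1
        total += score / (i1 * sigma_z)          # efficient influence function (2.9)
    return theta_tilde + total / len(data)

rng = random.Random(7)
theta_true = 0.5
data = simulate(20_000, theta_true, rng)
theta_tilde = -sum(z for _, z in data) / len(data)   # preliminary moment estimator
theta_hat = one_step(data, theta_tilde)
print(theta_tilde, theta_hat)
```

In this toy model the information is 2, so the one-step estimator should have roughly half the asymptotic variance of the preliminary estimator, which uses the covariates only.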



3 Unknown distribution of the covariates 
3.1 Model representation 

Assume that the vector of covariates $W$ has an unknown density $h$ with
respect to the dominating measure $\nu$. Extend model (2.1) to

$$\mathcal{P} := \{P_{(\theta,G,h)} : \theta \in \Theta_h,\ G \in \mathcal{G},\ h \in \mathcal{H}\}. \tag{3.1}$$

For fixed $P_0 = P_{(\theta_0,G_0,h_0)}$ there are submodels similar to (2.2) and (2.3), but
including the fixed $h_0$. Additionally define

$$\mathcal{P}_3 = \{P_{(\theta_0,G_0,h)} : h \in \mathcal{H}\}. \tag{3.2}$$

The tangent space $\dot{\mathcal{P}}_3$ of $\mathcal{P}_3$ is given by all functions of the covariate vector
$Z$ that have expectation 0 and finite variance under $f_{\theta_0,Z}$:

$$\dot{\mathcal{P}}_3 = \{b : \mathbb{R}^k \to \mathbb{R} : E_{f_{\theta_0,Z}}\, b(Z) = 0,\ E_{f_{\theta_0,Z}}\, b^2(Z) < \infty\}, \tag{3.3}$$

that is, every function of $Z$ in $L_2^0(P_0)$ is a score function of $h$ at $h_0$ and vice
versa.

To construct an efficient estimator of $\theta$, the same procedure as in
subsection 2.2 will be used. However, on account of the distribution of the covariate
vector $Z$ only, $\theta$ and $h$ are not identifiable. Therefore, the collection $\mathcal{H}$ is
restricted to the collection $\mathcal{H}_0$ of all density functions $h$ of $W$, such that $\Theta_h$
is nonempty and

(H1): $E_h W = 0$, $\theta \in \Theta_h$,

(H2): $E_h |W|^2 e^{\theta^T W} < \infty$, $\theta \in \Theta_h$.

The submodel $\mathcal{Q}$ of $\mathcal{P}$ is given by

$$\mathcal{Q} = \{P_{(\theta,G,h)} : \theta \in \Theta_h,\ G \in \mathcal{G},\ h \in \mathcal{H}_0\}. \tag{3.4}$$

Note that $E_h W = E_h e^{-\theta^T W} \cdot E_{f_\theta} Z e^{\theta^T Z}$ by (1.7). Because model $\mathcal{Q}$ yields a
restriction on the core model, it seems to be applicable only if the expectation
of the covariate vector $W$ is known.

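Take the submodels $\mathcal{Q}_1$, $\mathcal{Q}_2$, and $\mathcal{Q}_3$ of $\mathcal{Q}$ similar to $\mathcal{P}_1$, $\mathcal{P}_2$, and $\mathcal{P}_3$
respectively. Let $\dot{\mathcal{Q}}_3$ be the tangent space of the submodel $\mathcal{Q}_3$ of $\mathcal{Q}$ given by

$$\mathcal{Q}_3 = \{P_{(\theta_0,G_0,h)} : h \in \mathcal{H}_0\}. \tag{3.5}$$

Similarly to (2.7) the efficient score function $l_1^*$ for $\theta$ of $\mathcal{Q}$ is defined by

$$l_1^* = \dot l_1 - \Pi_0(\dot l_1 \mid \dot{\mathcal{P}}_2 + \dot{\mathcal{Q}}_3),$$

which equals

$$\dot l_1 - \Pi_0(\dot l_1 \mid \dot{\mathcal{P}}_2) - \Pi_0(\dot l_1 \mid \dot{\mathcal{Q}}_3), \tag{3.6}$$

due to the orthogonality of $\dot{\mathcal{P}}_2$ and $\dot{\mathcal{Q}}_3 \subset \dot{\mathcal{P}}_3$.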

Let $\{h_\eta\}$ be a parametric path through $h_0$ and let $b(Z)$ be the tangent

$$b(Z) := \frac{\partial}{\partial\eta} \log h_\eta(Z)\Big|_{\eta=0}. \tag{3.7}$$

To be a tangent its expectation has to vanish, but by condition (H1), also
the expectation of $Z \exp(\theta^T Z)\,b(Z)$ has to vanish. The latter can be seen by
differentiating the equality

$$E_\eta W := \int w\,h_\eta(w)\,d\nu(w) = 0 \tag{3.8}$$

with respect to $\eta$, by calculating at $\eta = 0$, and by then rewriting in terms of
$f_{\theta,Z}$.

Now define the tangent space of $\mathcal{Q}_3$ by

$$\dot{\mathcal{Q}}_3 = \{b \in L_2(P_0) : E\,b(Z) = 0,\ E\,Z e^{\theta^T Z} b(Z) = 0\}. \tag{3.9}$$

The true tangent space of $\mathcal{Q}_3$ could be larger, but complete knowledge of this
space is unnecessary as long as an efficient estimator for $\theta$ in $\mathcal{Q}$ can be found
with (3.9) as tangent space.

The efficient score function for $\theta$ in $\mathcal{Q}$ and the efficient Fisher information
at $P_0$ for $\theta$ for given $G_0$ and $h_0$ are presented in the next lemma,
the proof of which will be given in Section 4.

Lemma 3.1. Let $\Sigma_Z$ be given as in (C3). By (2.6), (2.23), (3.6), condition (H1)
and the fact that $E(Y\lambda_0(Y)) = 1$, the efficient score function for $\theta$
in model $\mathcal{Q}$ equals

$$l_1^*(X, Z, P_0 \mid \theta, \mathcal{Q}) = -(Z - EZ)(1 + Y\lambda_0(Y)) + E(ZZ^T e^{\theta^T Z}) \big[E\,ZZ^T e^{2\theta^T Z}\big]^{-1} Z e^{\theta^T Z}. \tag{3.10}$$

By (2.8) and (3.10) the corresponding Fisher information at $P_0$ equals

$$I(P_0 \mid \theta, \mathcal{Q}) = \Sigma_Z\, E(1 + Y\lambda_0(Y))^2 + E(ZZ^T e^{\theta^T Z}) \big[E\,ZZ^T e^{2\theta^T Z}\big]^{-1} E(ZZ^T e^{\theta^T Z}). \tag{3.11}$$

3.2 Existence of an efficient estimator of $\theta$

The following theorem will be shown to hold.

Theorem 3.2. Let model $\mathcal{Q}$ be given by (3.4). If conditions (C1)-(C5) and
conditions (H1) and (H2) are satisfied, then there exists an efficient estimator
$\hat\theta_n$ of $\theta$.

Proof. To prove this theorem it suffices, as in Theorem 2.5, to show the
existence of a $\sqrt{n}$-consistent estimator of $\theta$ and the existence of a $\sqrt{n}$-unbiased,
consistent estimator of the efficient influence function. The proof of the
existence of a consistent estimator of the efficient influence function is similar to
that of Section 2.2 (cf. Lemma 3.4). However, the proof of the existence of a
$\sqrt{n}$-consistent estimator of $\theta$ will differ completely, because the density $f_{\theta,Z}$
of the covariate vector $Z$ is not parametric anymore. Although $Z$ follows a
semiparametric model, it is still easier to base the $\sqrt{n}$-consistent estimator
of $\theta$ on the covariates only than on the whole sample. Under conditions (H1)
and (H2) an M-estimator of $\theta$ based on the covariates is constructed in the
proof of the next lemma.

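Lemma 3.3. Within model $\mathcal{Q}$ there exists a $\sqrt{n}$-consistent estimator $\tilde\theta_n$ of
$\theta$.

Proof. Define $\Psi(\theta, P) = \int z e^{\theta^T z}\,dP(x, z)$ and let $\theta(P)$ be the $\theta$ corresponding
to $P$. For every $\theta$, (1.7) and (H1) imply

$$\Psi(\theta(P), P) = \frac{E_h W}{E_h e^{-\theta^T W}} = 0, \tag{3.12}$$

because $h$ belongs to $\mathcal{H}_0$. Let $\mathbb{P}_n$ be the empirical distribution. Define the
M-estimator $\tilde\theta_n$ as the root of the equation

$$\Psi(\theta, \mathbb{P}_n) = \frac{1}{n}\sum_{i=1}^n Z_i e^{\theta^T Z_i} = 0. \tag{3.13}$$

The conditions of Theorem 7.4.2 of BKRW can be verified to hold, that is

1. There exists $\theta : \mathcal{Q} \to \mathbb{R}^k$, such that $\Psi(\theta(P), P) = 0$, $\forall P \in \mathcal{Q}$.

2. $\Psi(\cdot, P) = 0$ has a unique solution for all
$P \in \mathcal{Q} \cup \{\text{all realizations of } \mathbb{P}_n,\ n \ge 1\}$.

3. $\Psi(\cdot, P)$ is differentiable with derivative $\dot\Psi(\theta, P) = [\partial/\partial\theta_j\, \Psi_i(\theta, P)]_{k \times k}$
and $\dot\Psi(\theta(P), P)$ is nonsingular.

4. $\forall t \in \mathbb{R}^k$: $\sqrt{n}\,\{\Psi(\theta(P) + n^{-1/2} t, \mathbb{P}_n) - \Psi(\theta(P), \mathbb{P}_n)\} = \dot\Psi(\theta(P), P)\,t + o_P(1)$.

5. $\Psi(\theta(P), \mathbb{P}_n) = \int z e^{\theta(P)^T z}\,d\mathbb{P}_n(x, z) + o_P(n^{-1/2})$ and $Z e^{\theta(P)^T Z} \in L_2(P)$.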

The first condition is obvious from (3.12). The second condition holds for
all $P \in \mathcal{Q}$, because $\Psi(\theta, P)$ is strictly increasing in the components of $\theta$ and
therefore $\Psi(\theta, P) = 0$ has a unique solution given by $\theta = \theta(P)$. However, this
condition does not have to hold for all realizations $Z_i \in \mathbb{R}^k$, $i = 1, \ldots, n$, of
$\mathbb{P}_n$. To see this, assume there exists an index $j$ such that $Z_{ij} > 0$ (or $< 0$) for
all $i = 1, \ldots, n$. Then the $j$-th component of $\Psi(\theta, \mathbb{P}_n)$ does not pass through
zero and hence (3.13) has no solution. But asymptotically this problem
disappears, because for large samples the probability that this happens goes
to zero. The derivative $\dot\Psi(\theta, P)$ of $\Psi(\cdot, P)$ equals $\int zz^T e^{\theta^T z}\,dP(x, z)$, so
$\dot\Psi(\theta(P), P) = E_{f_\theta} ZZ^T e^{\theta^T Z} = \Sigma_W / E_h e^{-\theta^T W}$ is nonsingular by condition (C3).
Thus the third condition is fulfilled. For the fourth condition let $\theta_0 = \theta(P)$, then

$$\begin{aligned}
\sqrt{n}\,\{\Psi(\theta_0 + n^{-1/2} t, \mathbb{P}_n) - \Psi(\theta_0, \mathbb{P}_n)\}
&= \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i e^{\theta_0^T Z_i}\big(e^{n^{-1/2} t^T Z_i} - 1\big) \\
&= \frac{1}{n} \sum_{i=1}^n Z_i Z_i^T e^{\theta_0^T Z_i}\,t + o_P(1) \\
&= \dot\Psi(\theta_0, P)\,t + o_P(1).
\end{aligned} \tag{3.14}$$

By definition of $\Psi(\theta, P)$ the $o_P(n^{-1/2})$ term in the last condition vanishes.
The second part of condition 5 is just condition (H2).

Theorem 7.4.2 of BKRW now states that $\tilde\theta_n$ is a unique asymptotically
linear estimator of $\theta$, which implies $\sqrt{n}$-consistency. □
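
The M-estimator defined by (3.13) is straightforward to compute for a scalar covariate. The following minimal sketch is an illustration, not the paper's procedure: it solves the estimating equation by bisection, which is a choice of this sketch, and it simulates covariates under the assumption that h is standard normal, so that Z is N(-theta, 1) and (H1), (H2) hold. The function names `psi` and `m_estimator` are hypothetical. The explicit error mirrors the remark in the proof that no root exists when all covariates share one sign.

```python
import math
import random

def psi(theta, zs):
    """Empirical estimating function (3.13) for scalar covariates:
    the mean of Z * exp(theta * Z)."""
    return sum(z * math.exp(theta * z) for z in zs) / len(zs)

def m_estimator(zs, lo=-1.0, hi=1.0, tol=1e-10):
    """Root of psi(., zs) by bisection. psi is increasing in theta because
    its derivative, the mean of Z^2 * exp(theta * Z), is positive."""
    if min(zs) >= 0 or max(zs) <= 0:
        raise ValueError("all covariates share one sign: (3.13) has no root")
    # Widen the bracket until psi changes sign on [lo, hi].
    while psi(lo, zs) > 0:
        lo *= 2
    while psi(hi, zs) < 0:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, zs) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(99)
theta_true = 0.5
# Assumed core density h = N(0, 1), so the observed covariates are N(-theta_true, 1).
zs = [rng.gauss(-theta_true, 1.0) for _ in range(5_000)]
theta_tilde = m_estimator(zs)
print(theta_tilde)
```

Unlike the moment estimator available when h is known, this estimator uses only the mean-zero restriction (H1) on the core distribution.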

Lemma 2.6 still holds for model $\mathcal{Q}$, and the estimators $\hat g_Y$ and $\hat\lambda$ are the
same as for the model in Section 2.2 because the distribution of $Y$ does not
depend on $Z$, but only on $g$. This leads to the next lemma, which will also
be proved in Section 4.

Lemma 3.4. There exist estimators $\hat l_1^*$ for (3.10) and $\hat I$ for (3.11) such that
$\hat l_1 = \hat I^{-1} \hat l_1^*$ is a $\sqrt{n}$-unbiased consistent estimator of the efficient influence
function $\tilde l_1 = I^{-1} l_1^*$.

□

So, as in Section 2.2, by Corollary 7.8.1 of BKRW an efficient estimator
of $\theta$ exists and can be constructed using sample splitting.

4 Remaining proofs 

Proof of Lemma 2.6.

Let $Y_1, \ldots, Y_n$ be i.i.d. random variables with density $g_Y$, which is absolutely
continuous on $[0, \infty)$. Define $X_1, \ldots, X_n$ i.i.d. with

$$X_i := Y_i B_i, \quad \forall i \in \{1, \ldots, n\}, \tag{4.1}$$

where $B_1, \ldots, B_n$ are i.i.d. Bernoulli, independent of $Y_1, \ldots, Y_n$ such that

$$P(B_i = -1) = P(B_i = 1) = \tfrac{1}{2}. \tag{4.2}$$

Then $X_1, \ldots, X_n$ have density

$$\tilde g(x) = \tfrac{1}{2}\big(g_Y(x)\,I_{[x > 0]} + g_Y(-x)\,I_{[x < 0]}\big), \quad x \in \mathbb{R}. \tag{4.3}$$

Note that $\tilde g$ is absolutely continuous on $\mathbb{R}$ with

$$I_k(\tilde g) = \int x^{2k} \Big(\frac{\tilde g'}{\tilde g}\Big)^2(x)\,\tilde g(x)\,dx = \int_0^\infty x^{2k} \lambda^2(x)\,g_Y(x)\,dx, \tag{4.4}$$

which is finite for $k = 1$. Then, by Proposition 7.8.1 of BKRW there exists
an estimator $\hat h_1(x; X_1, \ldots, X_n)$ of $x\,\tilde g'/\tilde g(x)$ satisfying

$$\int \Big(\hat h_1(x) - x\,\frac{\tilde g'}{\tilde g}(x)\Big)^2 \tilde g(x)\,dx \xrightarrow{P} 0. \tag{4.5}$$

Define the randomized estimator $\tilde h_1(x)$ based on $Y_1, \ldots, Y_n$ by

$$\tilde h_1(x) = -\tfrac{1}{2}\big(|\hat h_1(x)| + |\hat h_1(-x)|\big), \quad x > 0. \tag{4.6}$$

Then

$$\int_0^\infty \Big(\tilde h_1(x) - x\,\frac{g_Y'}{g_Y}(x)\Big)^2 g_Y(x)\,dx \le \int \Big(\hat h_1(x) - x\,\frac{\tilde g'}{\tilde g}(x)\Big)^2 \tilde g(x)\,dx = o_P(1), \tag{4.7}$$

so $-\tilde h_1(x)$ is a consistent estimator for $x\lambda(x) = -x\,g_Y'/g_Y(x)$. The proof for
existence of an estimator $\hat g_Y$ of $g_Y$ so that (2.30) is satisfied is the same as in
the proof of Proposition 7.8.1 of BKRW.

Proof of Lemma 3.1.

As in the proof of Theorem 2.4 all projections are taken componentwise.
Because $\dot{\mathcal{Q}}_3 \subset \dot{\mathcal{P}}_3$ the projection formula

$$\Pi_0(\dot l_1 \mid \dot{\mathcal{Q}}_3) = \Pi_0\big(\Pi_0(\dot l_1 \mid \dot{\mathcal{P}}_3) \mid \dot{\mathcal{Q}}_3\big) \tag{4.8}$$

holds. As in (2.14) it can be seen by (2.6) that

$$\Pi_0(\dot l_1 \mid \dot{\mathcal{P}}_3) = E(\dot l_1 \mid Z) = Z - EZ \tag{4.9}$$

holds. Because $E(Z - EZ) = 0$, the projection of $Z - EZ$ on $\dot{\mathcal{Q}}_3$ is given by its
projection on the set of all functions $b(Z)$ for which $E(Z \exp(\theta^T Z)\,b(Z)) = 0$.
This is just its projection on the set of all functions perpendicular to
$Z \exp(\theta^T Z)$, denoted by $[Z \exp(\theta^T Z)]^\perp$. So

$$\Pi_0(Z - EZ \mid \dot{\mathcal{Q}}_3) = \Pi_0(Z - EZ \mid [Z e^{\theta^T Z}]^\perp) = Z - EZ - \Pi_0(Z - EZ \mid [Z e^{\theta^T Z}]). \tag{4.10}$$

By condition (H1) the last projection can be seen to equal

$$\Pi_0(Z - EZ \mid [Z e^{\theta^T Z}]) = E(ZZ^T e^{\theta^T Z}) \big[E(ZZ^T e^{2\theta^T Z})\big]^{-1} Z e^{\theta^T Z}. \tag{4.11}$$

The efficient score function follows by combining (3.6), (2.23), and (4.8)-(4.11).
Finally the efficient Fisher information is obtained from (2.8) and
(3.10).

Proof of Lemma 3.4.

As estimators for the population means $EZ$, $M_1 := E(ZZ^T e^{\theta^T Z})$ and
$M_2 := E(ZZ^T e^{2\theta^T Z})$ take their respective sample means $\bar Z_n$, $\hat M_1$ and $\hat M_2$. Let the
sample variance $S_Z^2$ be the estimator for the population variance $\Sigma_Z$, and let
$\hat\lambda_0$, $\hat g_Y$, and $\hat I_1$ be the estimators mentioned in Proposition 2.7 and Lemma
2.6. As an estimator of $l_1^*$ from (3.10) take

$$\hat l_1^* = -(Z - \bar Z_n)(1 + Y\hat\lambda_0(Y)) + \hat M_1 \hat M_2^{-1} Z e^{\theta^T Z} \tag{4.12}$$

and as an estimator of $I$ take

$$\hat I = S_Z^2\, E_{\hat g_Y}(1 + Y\hat\lambda_0(Y))^2 + \hat M_1 \hat M_2^{-1} \hat M_1. \tag{4.13}$$

Then let $\hat l_1 = \hat I^{-1} \hat l_1^*$ be the estimator of the efficient influence function
$\tilde l_1 = I^{-1} l_1^*$.

From the independence of $Y$ and $Z$ it follows that $\hat l_1$ is $\sqrt{n}$-unbiased.
The squared Euclidean norm of the difference between the efficient influence
function and its estimator can be shown to equal

$$|\hat l_1 - \tilde l_1|^2 \tag{4.14}$$
$$= \big|\hat I^{-1}(\bar Z_n - EZ)(1 + Y\lambda_0(Y)) + \hat I^{-1}(Z - EZ)(Y\lambda_0(Y) - Y\hat\lambda_0(Y)) + \hat I^{-1}(\hat M_1 \hat M_2^{-1} - M_1 M_2^{-1})\,Z e^{\theta^T Z} + (\hat I^{-1} - I^{-1})\,l_1^*\big|^2. \tag{4.15}$$

So the expectation of (4.14) is bounded from above by

\begin{align}
& E\big(|\hat l_1 - \tilde l_1|^2\big) \tag{4.16} \\
&\le 4\Big[\mathrm{Tr}\big(\hat I^{-1}\,\mathrm{Var}(\bar Z_n)\,\hat I^{-1}\big)\,E(1 + Y\lambda_0(Y))^2 \tag{4.17} \\
&\quad + \mathrm{Tr}\big(\hat I^{-1} \Sigma_Z \hat I^{-1}\big)\,E(Y\lambda_0(Y) - Y\hat\lambda_0(Y))^2 \tag{4.18} \\
&\quad + \mathrm{Tr}\big(\hat I^{-1}(\hat M_1 \hat M_2^{-1} - M_1 M_2^{-1})\,M_2\,(\hat M_1 \hat M_2^{-1} - M_1 M_2^{-1})^T \hat I^{-1}\big) \tag{4.19} \\
&\quad + \mathrm{Tr}\big((\hat I^{-1} - I^{-1})\,I\,(\hat I^{-1} - I^{-1})\big)\Big]. \tag{4.20}
\end{align}

By nonsingularity of $I$ and finiteness of $E(Y\lambda_0(Y))^2$ and (2.29) the
expressions (4.17)-(4.20) and thus (4.16) are $o_P(1)$. This completes the proof.

References 

[1] Bickel, P.J. (1982), On adaptive estimation, Ann. Statist. 10 647-671.

[2] Bickel, P.J., Klaassen, C.A.J., Ritov, Y., and Wellner, J.A. (1993), Ef- 
ficient and adaptive estimation for semiparametric models, Springer- 
Verlag, New York. 

[3] Cox, D.R. (1972), Regression models and life-tables (including discus- 
sion), J. Roy. Statist. Soc. Ser. B 34 187-220. 

[4] Es, B. van, Klaassen, C.A.J., and Oudshoorn, K. (2000), Survival
analysis under cross sectional sampling, J. Statist. Plann. Inf. 91 295-312.

[5] Kalbfleisch, J.D. and Prentice, R.L. (1980), The statistical analysis of 
failure time data, Wiley, New York. 

[6] Klaassen, C.A.J. (1987), Consistent estimation of the influence function 
of locally asymptotically linear estimators, Ann. Statist. 15 1548-1562. 

[7] Klaassen, C.A.J. (1989), Efficient estimation in the Cox model for sur- 
vival data, Proc. Fourth Prague Symp. Asympt. Statist., P. Mandl and 
M. Huskova (eds.), Charles University, Prague, 313-319. 

[8] Klaassen, C.A.J. (2001), Discussion of a paper by Bickel, P.J. and Kwon, 
J., Inference for semiparametric models: some questions and an answer, 
Statist. Sin. 11 863-960. 


[9] Le Cam, L. (1956), On the asymptotic theory of estimation and
testing hypotheses, Proc. Third Berkeley Symp. Math. Statist. Prob., J.
Neyman (ed.), University of California Press, Berkeley, 1 129-156.

[10] Reid, N. (1994), A conversation with Sir David Cox, Statist. Sci. 9 439- 
455. 

[11] Schick, A. (1986), On asymptotically efficient estimation in semipara- 
metric models, Ann. Statist. 14 1139-1151. 

[12] Tsiatis, A.A. (1981), A large sample study of Cox's regression model,
Ann. Statist. 9 93-108.

[13] Vaart, A.W. van der (1998), Asymptotic statistics, Cambridge
University Press, Cambridge.
